# Codebook: Teacher Testing Standards and the New Teacher Pipeline

## Replication Data for Law, Marks, and Stern (*Journal of Human Resources*)

**Harvard Dataverse DOI:** [10.7910/DVN/GDB8I0](https://doi.org/10.7910/DVN/GDB8I0)

**Last Updated:** April 2026

---

## Table of Contents

1. [Overview](#1-overview)
2. [File Inventory](#2-file-inventory)
3. [Key Concepts and Variable Definitions](#3-key-concepts-and-variable-definitions)
4. [Raw Data Files](#4-raw-data-files)
   - 4.1 [ETS Data](#41-ets-data)
   - 4.2 [Policy and Controls](#42-policy-and-controls)
   - 4.3 [Licenses](#43-licenses)
   - 4.4 [Teacher Shortages](#44-teacher-shortages)
   - 4.5 [Placebo Data](#45-placebo-data)
   - 4.6 [Title II Data](#46-title-ii-data)
   - 4.7 [Composite Treatment Files](#47-composite-treatment-files)
5. [Cleaned / Analysis-Ready Data Files](#5-cleaned--analysis-ready-data-files)
   - 5.1 [ets_treatment_data](#51-ets_treatment_data)
   - 5.2 [ipeds_data_cleaned](#52-ipeds_data_cleaned)
   - 5.3 [enrollment_event_data](#53-enrollment_event_data)
   - 5.4 [graduation_event_data](#54-graduation_event_data)
   - 5.5 [titleII_final_data](#55-titleii_final_data)
6. [Variable Naming Conventions](#6-variable-naming-conventions)
7. [Data Pipeline](#7-data-pipeline)
8. [Sample Construction](#8-sample-construction)
9. [Missing Data and Special Values](#9-missing-data-and-special-values)

---

## 1. Overview

This codebook documents every data file and variable in the replication package for "Teacher Testing Standards and the New Teacher Pipeline." The study examines how the 2013-2014 transition from the Praxis Pre-Professional Skills Test (PPST) to the harder Praxis Core Academic Skills for Educators affected teacher supply.

### Unit of Analysis

The primary analysis datasets are structured as follows:

| Dataset | Unit | Panel Structure | Years | N (approx.) |
|---------|------|----------------|-------|-------------|
| `ets_treatment_data` | State-year | 24 states x 13 years | 2008-2020 | 312 |
| `ipeds_data_cleaned` | Institution-year | ~1,700 institutions x 13 years | 2008-2020 | 20,117 |
| `enrollment_event_data` | Institution-year | 566 institutions x 6 biennial years | 2008-2018 | 3,201 |
| `graduation_event_data` | Institution-year | 568 institutions x 12 annual years | 2009-2020 | 6,384 |
| `titleII_final_data` | State-program-year | ~2,500 programs x ~10 years | 2012-2022 | 23,672 |

### States in Sample

The analysis sample consists of 22 jurisdictions (21 states + DC) that used the PPST and transitioned to Praxis Core. Two additional states (ND, TN) appear in some descriptive analyses, for 24 states in the ETS treatment data. Oregon (OR) is absent from the main ETS file but is added back during cleaning and is part of the regression sample.

**22-state regression sample:** AK, AR, CT, DC, DE, HI, LA, MD, ME, MS, NC, NE, NH, NJ, NV, OR, PA, SC, VA, VT, WI, WV

**Additional states in ETS data:** ND, TN

---

## 2. File Inventory

### Raw Data (`data/raw/`)

| Folder | File | Format | Rows | Cols | Description |
|--------|------|--------|------|------|-------------|
| `ets/` | `cleaned ppst and core and composite.dta` | Stata | 441 | 13 | PPST and Praxis Core passing scores by state-year-subject |
| `ets/` | `states_we_add_back_in.xlsx` | Excel | 63 | 11 | Supplementary passing scores for ND, OR, TN |
| `ets/` | `PPST and Core Math.xlsx` | Excel | 51 | 16 | Raw math passing scores by state-year |
| `ets/` | `PPST and Core Reading.xlsx` | Excel | 51 | 16 | Raw reading passing scores by state-year |
| `ets/` | `PPST and Core Writing.xlsx` | Excel | 51 | 16 | Raw writing passing scores by state-year |
| `policy/` | `state_treatment.dta` | Stata | 288 | 34 | State-level economic and education policy controls |
| `licenses/` | `state_data_clean.xlsx` | Excel | 264 | 48 | State-year teacher license and completer data |
| `licenses/` | `stateyrlicensetradalt.dta` | Stata | 1,071 | 5 | Teacher licenses and completers by route (all 50 states + DC) |
| `shortages/` | `total_shortages_state_year.dta` | Stata | 301 | 4 | Teacher shortage areas by state-year |
| `placebo/` | `placebo_data.dta` | Stata | 5,784 | 83 | Completions by field of study (placebo outcomes) |
| `placebo/` | `placebo_enrollment_data.dta` | Stata | 2,898 | 112 | Non-education enrollment data (placebo) |
| `title_ii/` | `title_II_completer_clean.xlsx` | Excel | 12,326 | 54 | Title II teacher prep program completers |
| `title_ii_results/` | `title_II_graduations_event_study.xlsx` | Excel | 32 | 2 | Stata regression output: Title II event study |
| `composite_treatment/` | `enrollments_event_data.xlsx` | Excel | 3,201 | 124 | Enrollment panel with composite TDI treatment |
| `composite_treatment/` | `enrollments_event_data_binding.xlsx` | Excel | 3,201 | 125 | Enrollment panel with binding-subject TDI |
| `composite_treatment/` | `enrollments_event_data_math.xlsx` | Excel | 3,201 | 125 | Enrollment panel with math-only TDI |
| `composite_treatment/` | `enrollments_event_data_reading.xlsx` | Excel | 3,201 | 125 | Enrollment panel with reading-only TDI |
| `composite_treatment/` | `enrollments_event_data_writing.xlsx` | Excel | 3,201 | 125 | Enrollment panel with writing-only TDI |
| `composite_treatment/` | `graduation_event_data.xlsx` | Excel | 6,384 | 103 | Graduation panel with composite TDI treatment |
| `composite_treatment/` | `graduation_event_data_binding.xlsx` | Excel | 6,384 | 121 | Graduation panel with binding-subject TDI |
| `composite_treatment/` | `graduation_event_data_math.xlsx` | Excel | 6,384 | 103 | Graduation panel with math-only TDI |
| `composite_treatment/` | `graduation_event_data_reading.xlsx` | Excel | 6,384 | 103 | Graduation panel with reading-only TDI |
| `composite_treatment/` | `graduation_event_data_writing.xlsx` | Excel | 6,384 | 103 | Graduation panel with writing-only TDI |

### Cleaned Data (`data/cleaned/`)

| File | Format | Rows | Cols | Description |
|------|--------|------|------|-------------|
| `ets_treatment_data.xlsx` | Excel | 312 | 36 | State-year panel: TDI and event study variables |
| `ets_treatment_data.dta` | Stata | 312 | 50 | Same as above, with year interaction dummies for Stata |
| `ipeds_data_cleaned.xlsx` | Excel | 20,117 | 117 | Full IPEDS institution-year panel |
| `enrollment_event_data.xlsx` | Excel | 3,201 | 124 | Biennial enrollment analysis panel |
| `graduation_event_data.xlsx` | Excel | 6,384 | 103 | Annual graduation analysis panel |
| `titleII_final_data.xlsx` | Excel | 23,672 | 21 | Title II program-year panel |

---

## 3. Key Concepts and Variable Definitions

### Test Difficulty Index (TDI)

The TDI measures how stringent a state's teacher licensure testing requirements are, relative to the national test-taking population.

**Construction:**
1. Each state sets a passing score for each Praxis subject test (math, reading, writing)
2. ETS publishes the national mean and standard deviation of scores on each test
3. Z-score = (state passing score - national mean) / national SD
4. TDI = average of z-scores across math, reading, and writing

A higher TDI means a more stringent requirement. A z-score of 0 means the state's passing score equals the national average.

**Composite TDI:** Some states use a composite passing rule (e.g., the sum of the three subject scores must exceed a threshold, rather than each subject meeting its own minimum). The composite TDI accounts for this by using `passingscore_composite`, which is typically the state's composite threshold divided by 3 and runs roughly 3 points below the individual subject passing scores.

### DeltaTDI (Treatment Intensity)

DeltaTDI = TDI(2014) - TDI(2012)

This measures how much the testing requirement changed for each state when transitioning from PPST to Praxis Core. States with larger DeltaTDI experienced greater increases in testing stringency. This is the treatment variable in the regression analysis.

- Range: -0.16 to 1.10 (composite), -0.51 to 0.75 (non-composite)
- A negative DeltaTDI means the state actually became *less* stringent (rare)
- The variable `continuous_treat` in the data equals DeltaTDI
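A minimal sketch of the DeltaTDI calculation and of attaching it as a time-invariant `continuous_treat` value to every year of a state's panel. The helper names and TDI values below are hypothetical. The sketch assumes TDI values for 2012 and 2014 are already computed for the state.

```python
def delta_tdi(tdi_by_year: dict[int, float], pre: int = 2012, post: int = 2014) -> float:
    """DeltaTDI = TDI(post) - TDI(pre); positive means the state became stricter."""
    return tdi_by_year[post] - tdi_by_year[pre]


def attach_treatment(state_years: list[dict], delta: float) -> list[dict]:
    """Copy one state's DeltaTDI onto every year as `continuous_treat` (time-invariant)."""
    return [{**row, "continuous_treat": delta} for row in state_years]


# Hypothetical TDI values for a single state.
delta = delta_tdi({2012: -1.0, 2014: -0.25})
print(delta)  # 0.75
panel = attach_treatment([{"year": y} for y in (2010, 2012, 2014)], delta)
```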

### Event Study Variables

The event study design interacts DeltaTDI with indicators for years before and after treatment:

- **lead5, lead3, lead1** (enrollment, biennial): DeltaTDI x pre-treatment period indicators
- **lead1 through lead5** (graduation, annual): DeltaTDI x pre-treatment year indicators
- **lag0 through lag7**: DeltaTDI x post-treatment period indicators
- **Reference period**: 2012 (last pre-treatment year). All lead/lag values for the reference period are zero.
- **Biennial structure**: In the enrollment data (even years only), even-numbered leads/lags are always zero because those periods don't exist in the data.
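The following sketch maps calendar years to lead/lag labels. It assumes the labels come from `time_till = year - 2013` (the `treatment_year` in `state_treatment.dta`), with 2012 (event time -1, `lead1`) as the omitted reference period. The function names are hypothetical. The sketch also shows why even-numbered lags never receive a nonzero value in the biennial enrollment panel: those event times do not occur.

```python
TREATMENT_YEAR = 2013  # `treatment_year`, constant across states
REFERENCE_YEAR = 2012  # omitted period; its lead/lag terms are all zero


def event_name(year: int) -> str:
    """Label a calendar year by event time k = year - 2013 (leadK if k < 0, else lagK)."""
    k = year - TREATMENT_YEAR
    return f"lead{-k}" if k < 0 else f"lag{k}"


def event_terms(year: int, delta_tdi: float) -> dict[str, float]:
    """Nonzero DeltaTDI x event-time terms for one observation.

    Any lead/lag not returned is zero; the reference year returns nothing.
    """
    if year == REFERENCE_YEAR:
        return {}
    return {event_name(year): delta_tdi}


enrollment_years = range(2008, 2019, 2)  # biennial 2008-2018
labels = sorted({event_name(y) for y in enrollment_years if y != REFERENCE_YEAR})
print(labels)  # ['lag1', 'lag3', 'lag5', 'lead3', 'lead5']
```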

### Selectivity Classification

Institutions are classified as "more selective" or "less selective" based on the **2010 median** of SAT/ACT 25th percentile scores across all sample institutions. Institutions scoring **strictly above** the median are "more selective" (`selective = 1`). This classification is time-invariant (fixed at 2010 values).

- Baseline SAT scores stored in `satvr25_2`, `satmt25_2` (the `_2` suffix denotes baseline/imputed values)
- Baseline ACT scores stored in `actcm25_2`, `actcm75_2`
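The median split can be sketched as follows. This is an illustration under one labeled assumption: each institution already has a single 2010 baseline score. The codebook does not say how SAT and ACT 25th-percentile scores are combined into that score. The unitids and scores are hypothetical. Because the comparison is strict, an institution sitting exactly at the median is classified as less selective.

```python
from statistics import median


def classify_selective(baseline_score: dict[int, float]) -> dict[int, int]:
    """`selective` = 1 if an institution's 2010 score is strictly above the sample median."""
    cutoff = median(baseline_score.values())
    return {unitid: int(score > cutoff) for unitid, score in baseline_score.items()}


# Hypothetical unitids and 2010 baseline scores.
print(classify_selective({101: 900, 102: 1000, 103: 1100}))  # {101: 0, 102: 0, 103: 1}
```

Because the cutoff is computed once from 2010 values, the resulting flag can be merged onto every year of the institution panel unchanged.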

### ETS Test Codes

| Code | Test | Era |
|------|------|-----|
| 710 | PPST Reading | Old (pre-2014) |
| 720 | PPST Writing | Old (pre-2014) |
| 730 | PPST Math | Old (pre-2014) |
| 5712 | Praxis Core Reading | New (2013+) |
| 5722 | Praxis Core Writing | New (2013+) |
| 5732 | Praxis Core Math | New (2013+) |

### IPEDS CIP Codes

Completions and enrollments are tracked using the Classification of Instructional Programs (CIP):

- **CIP 2-digit code 13**: Education (broad category -- includes all education majors)
- **CIP 4-digit (XX.XX) teacher preparation codes**: 13.01, 13.02, 13.03, 13.10, 13.12, 13.13, 13.14, 13.99
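The following sketch flags detailed IPEDS codes against these categories. It assumes CIP codes are stored as zero-padded strings in `XX.XXXX` form (e.g., `"13.1202"`). The function names are hypothetical. A detailed code counts as teacher preparation if its first five characters match one of the listed `XX.XX` series.

```python
TEACHER_PREP_SERIES = {"13.01", "13.02", "13.03", "13.10", "13.12", "13.13", "13.14", "13.99"}


def is_education(cip: str) -> bool:
    """Broad education category: 2-digit CIP series 13."""
    return cip.split(".")[0] == "13"


def is_teacher_prep(cip: str) -> bool:
    """True if a detailed code such as '13.1202' falls in one of the listed XX.XX series."""
    return cip[:5] in TEACHER_PREP_SERIES


print(is_teacher_prep("13.1202"), is_teacher_prep("13.0401"), is_education("13.0401"))
```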

### Carnegie Classification

The `ccbasic` variable uses the Carnegie Classification of Institutions of Higher Education. Key values:

| Code | Classification |
|------|---------------|
| -3 | Not classified |
| 15 | Doctoral Universities: Very High Research Activity |
| 16 | Doctoral Universities: High Research Activity |
| 17 | Doctoral/Professional Universities |
| 18-20 | Master's Colleges & Universities |
| 21-23 | Baccalaureate and Baccalaureate/Associate's Colleges |
| 24-33 | Special Focus and other categories |

---

## 4. Raw Data Files

### 4.1 ETS Data

#### `data/raw/ets/cleaned ppst and core and composite.dta`

**Source:** Educational Testing Service (ETS), compiled by authors
**Unit:** State x year x subject (biennial: 2008, 2010, 2012, 2014, 2016, 2018, 2020)
**Rows:** 441 | **Columns:** 13
**States:** 21 (AK, AR, CT, DC, DE, HI, LA, MD, ME, MS, NC, NE, NH, NJ, NV, PA, SC, VA, VT, WI, WV)

| Variable | Type | Missing | Values/Range | Description |
|----------|------|---------|-------------|-------------|
| `State` | string | 0 | 21 state abbreviations | State postal abbreviation |
| `ID` | string | 0 | e.g., "20080710" | Composite identifier: year (4 digits) followed by the test code zero-padded to 4 digits (e.g., "0710" for PPST Reading) |
| `passingscore` | float | 23 | 130-178 | State-mandated minimum passing score for the test. Missing for state-year-subject combinations where the test was not required. |
| `year` | float | 0 | 2008, 2010, 2012, 2014, 2016, 2018, 2020 | Biennial observation year |
| `test` | integer | 0 | 710, 720, 730, 5712, 5722, 5732 | ETS test identification code. 710/720/730 = PPST (old); 5712/5722/5732 = Praxis Core (new). |
| `subject` | string | 0 | math, read, write | Subject area tested |
| `time` | string | 0 | old, new | Test era: "old" = PPST (pre-2014), "new" = Praxis Core (2013+) |
| `test_name` | string | 0 | old_read, old_write, old_math, new_read, new_write, new_math | Combined test era and subject identifier |
| `test_mean` | float | 0 | 154.9-178.0 | National mean score on this test form. Published by ETS. Subtracted from the passing score in the z-score numerator. |
| `test_sd` | float | 0 | 3.9-21.1 | National standard deviation of scores on this test form. Published by ETS. Used as denominator in z-score. |
| `z_score` | float | 23 | -1.30 to 0.16 | Standardized passing score: (passingscore - test_mean) / test_sd. Negative = below-average stringency; positive = above-average. Missing when passingscore is missing. |
| `eftotlt` | float | 0 | 475-28,598 | Total fall enrollment in education programs (from IPEDS). State-level aggregate. |
| `passingscore_composite` | float | 23 | 130-176 | Adjusted passing score for states using a composite passing rule. For composite states, this equals the composite threshold / 3 (approximately 3 points lower than the individual subject passing score). For non-composite states, equals passingscore. |
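The following sketch decodes `ID` into the `year`, `test`, `time`, `subject`, and `test_name` columns, using the test codes listed in Section 3. The function name is hypothetical. The sketch accepts both the string IDs in this file and the integer IDs in `states_we_add_back_in.xlsx`.

```python
TEST_CODES = {
    710: ("old", "read"), 720: ("old", "write"), 730: ("old", "math"),
    5712: ("new", "read"), 5722: ("new", "write"), 5732: ("new", "math"),
}


def parse_ets_id(ets_id) -> dict:
    """Split an ETS `ID` (string '20080710' or integer 20080710) into its parts."""
    text = str(ets_id)
    year, test = int(text[:4]), int(text[4:])  # '0710' -> 710
    era, subject = TEST_CODES[test]
    return {"year": year, "test": test, "time": era, "subject": subject,
            "test_name": f"{era}_{subject}"}


print(parse_ets_id("20080710"))
```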

#### `data/raw/ets/states_we_add_back_in.xlsx`

**Unit:** State x year x subject (biennial)
**Rows:** 63 | **Columns:** 11
**States:** 3 (ND, OR, TN)

Same variable structure as the main ETS file above, but without `z_score` or `eftotlt`. These three states were initially excluded from the main ETS dataset but are added back during cleaning (script 01). ND and TN are excluded from the main regression sample; OR is included.

| Variable | Type | Missing | Values/Range | Description |
|----------|------|---------|-------------|-------------|
| `State` | string | 0 | ND, OR, TN | State abbreviation |
| `ID` | integer | 0 | 20080710-20205732 | Year + test code |
| `passingscore` | integer | 0 | 150-175 | State passing score |
| `year` | integer | 0 | 2008-2020 (biennial) | Year |
| `test` | integer | 0 | 710-5732 | ETS test code |
| `subject` | string | 0 | math, read, write | Subject |
| `time` | string | 0 | new, old | Test era |
| `test_name` | string | 0 | old_read...new_math | Combined test era + subject |
| `test_mean` | float | 0 | 154.9-178.0 | National mean score |
| `test_sd` | float | 0 | 3.9-21.1 | National standard deviation |
| `passingscore_composite` | integer | 0 | 148-175 | Composite-adjusted passing score |

#### `data/raw/ets/PPST and Core Math.xlsx`

**Unit:** State (all 50 + DC)
**Rows:** 51 | **Columns:** 16

Wide-format file with one row per state and columns for math passing scores across years. This is the raw source data from which the long-format `.dta` file was constructed.

| Variable | Type | Missing | Values/Range | Description |
|----------|------|---------|-------------|-------------|
| `stateab` | string | 0 | 51 entries | State abbreviation (some have annotation suffixes like & or *) |
| `passingscore2010XXXX` through `passingscore2020XXXX` | float/str | varies (24-50) | 130-178, or "dropped" | Passing score for that year and test. "dropped" means the state stopped requiring the test. Missing means the state did not use this test in that year. Column name encodes year + test code (e.g., `passingscore20100730` = PPST Math in 2010). |
| `Unnamed: 13`, `Unnamed: 14` | float | ~50 | mostly NaN | Artifact columns (unused) |
| `content of math test changed...` | string | 48 | notes | Annotation about test content changes in 2019-2020 |
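The following sketch reshapes one wide-format row into long state-year-test records by decoding the `passingscore{year}{test}` column names. It assumes the row has already been read into a plain dict. The function name and example values are hypothetical. It strips `&`/`*` annotations from `stateab` and skips empty or `"dropped"` cells. The long `.dta` file instead keeps some of these combinations as rows with `passingscore` missing, so this sketch is not an exact rebuild of that file.

```python
import math
import re

# Column names encode year (4 digits) + test code, e.g. passingscore20100730.
SCORE_COLUMN = re.compile(r"^passingscore(\d{4})(\d{3,4})$")


def wide_row_to_long(row: dict) -> list[dict]:
    """Turn one state's wide-format row into long state-year-test records.

    Cells that are empty (None/NaN) or marked "dropped" are skipped.
    """
    state = re.sub(r"[^A-Z]", "", str(row["stateab"]))  # strip '&' / '*' annotations
    records = []
    for column, value in row.items():
        match = SCORE_COLUMN.match(column)
        if match is None or value is None or value == "dropped":
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        records.append({
            "State": state,
            "year": int(match.group(1)),
            "test": int(match.group(2)),  # '0730' -> 730
            "passingscore": float(value),
        })
    return records


# Hypothetical row, for illustration only.
row = {"stateab": "AK*", "passingscore20100730": 174,
       "passingscore20145732": "dropped", "passingscore20165732": float("nan"),
       "Unnamed: 13": None}
print(wide_row_to_long(row))
```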

#### `data/raw/ets/PPST and Core Reading.xlsx`

Same structure as the Math file, but for reading tests (codes 710/5712). Passing scores are more uniform across states (many states set 156 for Praxis Core Reading). Contains a `notes` column about composite score rules.

#### `data/raw/ets/PPST and Core Writing.xlsx`

Same structure as the Math file, but for writing tests (codes 720/5722). Writing passing scores range from 158 to 176 and vary more across states than reading scores do.

---

### 4.2 Policy and Controls

#### `data/raw/policy/state_treatment.dta`

**Source:** Bureau of Labor Statistics (economic), Census Bureau (population), Kraft et al. (2020) (education policy), Chung and Zou (2023) (edTPA)
**Unit:** State-year (annual)
**Rows:** 288 | **Columns:** 34
**States:** 24 | **Years:** 2009-2020

| Variable | Type | Missing | Values/Range | Description |
|----------|------|---------|-------------|-------------|
| `year` | integer | 0 | 2009-2020 | Calendar year |
| `State` | string | 0 | 24 state abbreviations | State postal abbreviation |
| `statename` | string | 0 | Full state names | State name (e.g., "Alaska", "District of Columbia") |
| `shrinking_state` | integer | 0 | 0, 1 | Binary indicator: 1 if the state experienced declining school-age population over the sample period. Used for subsample analysis. |
| `real_income` | float | 0 | 41,312-92,266 | State-level real per-capita personal income (inflation-adjusted dollars). Source: BLS/Census. |
| `unemployment_rate` | float | 0 | 2.1-13.8 | State-level unemployment rate (percent). Source: BLS Local Area Unemployment Statistics. |
| `passevals` | integer | 0 | 0, 1 | Binary: 1 if the state has passed legislation requiring teacher performance evaluations in this year. Source: Kraft et al. (2020). |
| `implementevals` | integer | 0 | 0, 1 | Binary: 1 if the state has implemented teacher evaluation requirements in this year. Typically follows `passevals` by 1-2 years. |
| `eliminate_tenure` | integer | 0 | 0, 1 | Binary: 1 if the state has eliminated teacher tenure protections. |
| `increase_probationary_period` | integer | 0 | 0, 1 | Binary: 1 if the state has increased the probationary period for new teachers. |
| `weaken_bargaining` | integer | 0 | 0, 1 | Binary: 1 if the state has weakened teacher collective bargaining rights. |
| `eliminate_union_dues` | integer | 0 | 0, 1 | Binary: 1 if the state has eliminated mandatory teacher union dues. |
| `won_race_top` | integer | 0 | 0, 1 | Binary: 1 if the state won a federal Race to the Top grant. |
| `common_core` | integer | 0 | 0, 1 | Binary: 1 if the state has adopted Common Core State Standards. |
| `edtpa` | integer | 0 | 0, 1 | Binary: 1 if the state requires the edTPA performance assessment for teacher licensure. Source: Chung and Zou (2023). |
| `treatment_year` | integer | 0 | 2013 | Year of Praxis Core transition (constant across states). |
| `time_till` | integer | 0 | -4 to 7 | Years relative to treatment: year - treatment_year. Negative = pre-treatment; positive = post-treatment. |
| `year_2008`...`year_2020` | float | 0 | 0 or treatment intensity | Year x DeltaTDI interaction variables for Stata regression. The reference year (2012) is always 0. Other years equal `continuous_treat` if the observation is in that year, 0 otherwise. |
| `test_index` | float | 1 | -1.52 to -0.25 | Composite Test Difficulty Index for this state-year. Negative values because most states set passing scores below the national mean. |
| `test_index_lag` | float | 0 | -1.52 to -0.25 | test_index lagged by one year. Used in graduation regressions (Table 5) because graduates respond to the testing regime they faced at entry. |
| `continuous_treat` | float | 0 | -0.16 to 1.10 | DeltaTDI: change in composite TDI from 2012 to 2014 (post minus pre). This is the treatment intensity. Time-invariant within state. |
| `continuous_treatment_amount` | float | 120 | -0.16 to 1.10 | Same as `continuous_treat` but only populated for post-treatment years (2013+). Missing for pre-treatment years. Used in some TWFE specifications. |
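The following sketch builds the `year_YYYY` interactions and `time_till` as the table describes them. For each observation, only its own year's dummy carries `continuous_treat`, and `year_2012` is always zero. The function name and example values are hypothetical, and the year range is taken from the `year_2008`...`year_2020` column names.

```python
TREATMENT_YEAR = 2013
REFERENCE_YEAR = 2012


def year_interactions(year: int, continuous_treat: float,
                      years: range = range(2008, 2021)) -> dict[str, float]:
    """year_YYYY = continuous_treat in the observation's own year, else 0; year_2012 is always 0."""
    row = {f"year_{y}": 0.0 for y in years}
    if year != REFERENCE_YEAR:
        row[f"year_{year}"] = continuous_treat
    row["time_till"] = year - TREATMENT_YEAR
    return row


r = year_interactions(2015, 0.4)
print(r["year_2015"], r["time_till"])  # 0.4 2
```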

---

### 4.3 Licenses

#### `data/raw/licenses/stateyrlicensetradalt.dta`

**Source:** Kraft and Lyon (2024)
**Unit:** State-year (annual)
**Rows:** 1,071 | **Columns:** 5
**States:** 51 (all 50 + DC) | **Years:** 2001-2021

| Variable | Type | Missing | Values/Range | Description |
|----------|------|---------|-------------|-------------|
| `year` | integer | 0 | 2001-2021 | Calendar year |
| `statename` | string | 0 | 51 full state names | State name |
| `licenses` | float | 55 | 0-35,195 | Total new teaching licenses issued by the state in this year |
| `alt_completers` | float | 2 | 0-35,544 | Completers of alternative route teacher preparation programs |
| `trad_completers` | float | 18 | 0-24,904 | Completers of traditional teacher preparation programs |

#### `data/raw/licenses/state_data_clean.xlsx`

**Source:** Constructed by authors, merging license data with state-level controls and treatment variables
**Unit:** State-year (annual)
**Rows:** 264 | **Columns:** 48
**States:** 22 | **Years:** 2009-2020

This file pre-merges license/completer data with state controls and treatment variables for the license event study (Figure 6). Variables are largely a subset of what appears in the main event study files, plus:

| Variable | Type | Missing | Values/Range | Description |
|----------|------|---------|-------------|-------------|
| `state_graduates` | integer | 0 | 196-14,047 | Total teacher education program graduates in the state |
| `licenses` | float | 23 | 194-29,727 | Total new teaching licenses issued |
| `trad_completers` | integer | 0 | 118-11,742 | Traditional route completers |
| `alt_completers` | integer | 0 | 0-2,027 | Alternative route completers |
| `log_graduates` | float | 0 | 5.28-9.55 | ln(state_graduates) |
| `log_licenses` | float | 23 | 5.27-10.30 | ln(licenses) |
| `log_trad_completers` | float | 0 | 4.77-9.37 | ln(trad_completers) |
| `log_alt_completers` | float | 32 | 1.79-7.61 | ln(alt_completers). Missing when alt_completers = 0. |
| `statefips` | integer | 0 | 2-55 | State FIPS code |
| `cohort_5_9` | integer | 0 | 26,634-760,840 | State population ages 5-9 (Census) |
| `cohort_10_14` | integer | 0 | 23,091-784,757 | State population ages 10-14 |
| `cohort_15_17` | integer | 0 | 15,193-513,799 | State population ages 15-17 |
| `population` | integer | 0 | 599,657-12,807,060 | Total state population |
| `pop_5_17` | integer | 0 | 68,406-2,058,783 | School-age population (5-17) |
| `percentage_change_08_18` | float | 0 | -10.46 to 8.85 | Percent change in school-age population from 2008 to 2018 |
| *(remaining variables)* | | | | Same policy controls, treatment variables, and year interactions as in `state_treatment.dta` |
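The `log_` variables in this file appear to be plain natural logs. For example, the `log_graduates` minimum of 5.28 matches ln(196). This contrasts with the ln(x+1) transform used in the placebo and Title II files, and it explains why `log_alt_completers` is missing when the count is zero. The following sketch implements that convention with a hypothetical helper.

```python
import math
from typing import Optional


def log_or_missing(x: float) -> Optional[float]:
    """ln(x), or None (missing) when x <= 0, as with `log_alt_completers`."""
    return math.log(x) if x > 0 else None


print(round(log_or_missing(196), 2))  # 5.28, the minimum of log_graduates
print(log_or_missing(0))              # None
```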

---

### 4.4 Teacher Shortages

#### `data/raw/shortages/total_shortages_state_year.dta`

**Source:** U.S. Department of Education Teacher Shortage Area (TSA) database (tsa.ed.gov)
**Unit:** State-year (annual)
**Rows:** 301 | **Columns:** 4
**States:** 22 | **Years:** 2008-2021

| Variable | Type | Missing | Values/Range | Description |
|----------|------|---------|-------------|-------------|
| `statename` | string | 0 | 22 full state names | State name |
| `num_shortages` | float | 0 | 3-673 | Number of designated teacher shortage areas in the state for that year. A shortage area is a subject-grade combination where the state has identified an insufficient supply of qualified teachers. |
| `year` | integer | 0 | 2008-2021 | Calendar year |
| `ln_shortages` | float | 0 | 1.10-6.51 | ln(num_shortages). Pre-computed log for regression use. |

---

### 4.5 Placebo Data

#### `data/raw/placebo/placebo_data.dta`

**Source:** IPEDS completions by 2-digit CIP code
**Unit:** Institution-year (annual)
**Rows:** 5,784 | **Columns:** 83
**Institutions:** 511 | **Years:** 2009-2020

This file contains completions by field of study for each institution, used to construct placebo outcomes. If TDI affects teacher preparation specifically (not all higher education), we should see effects on education completions but not on other fields.

**Completion count variables (38 fields, listed after the `unitid` and `year` identifiers):**

| Variable | Type | Range | Description |
|----------|------|-------|-------------|
| `unitid` | integer | 102,553-495,767 | IPEDS institution ID |
| `year` | integer | 2009-2020 | Year |
| `agriculture_completions` | integer | 0-522 | Agriculture and related sciences (CIP 01) |
| `naturalresources_completions` | integer | 0-433 | Natural resources and conservation (CIP 03) |
| `architecture_completions` | integer | 0-272 | Architecture (CIP 04) |
| `genderstudies_completions` | integer | 0-418 | Area, ethnic, cultural, gender studies (CIP 05) |
| `commujournal_completions` | integer | 0-1,125 | Communication, journalism (CIP 09) |
| `commu_completions` | integer | 0-142 | Communication technologies (CIP 10) |
| `computer_completions` | integer | 0-3,643 | Computer and information sciences (CIP 11) |
| `culinary_completions` | integer | 0-50 | Culinary/personal services (CIP 12) |
| `education_completions` | integer | 0-3,917 | Education (CIP 13) -- NOT a placebo; included for comparison |
| `engin_completions` | integer | 0-3,029 | Engineering (CIP 14) |
| `engintech_completions` | integer | 0-418 | Engineering technologies (CIP 15) |
| `linguistics_completions` | integer | 0-606 | Foreign languages, literatures, linguistics (CIP 16) |
| `consumer_completions` | integer | 0-634 | Family and consumer sciences (CIP 19) |
| `legal_completions` | integer | 0-709 | Legal professions (CIP 22) |
| `english_completions` | integer | 0-1,118 | English language and literature (CIP 23) |
| `liberalarts_completions` | integer | 0-1,125 | Liberal arts and sciences (CIP 24) |
| `library_completions` | integer | 0-302 | Library science (CIP 25) |
| `biology_completions` | integer | 0-1,294 | Biological/biomedical sciences (CIP 26) |
| `math_completions` | integer | 0-576 | Mathematics and statistics (CIP 27) |
| `military_completions` | integer | 0-583 | Military science (CIP 28/29) |
| `interdisc_completion` | integer | 0-1,654 | Multi/interdisciplinary studies (CIP 30) |
| `parks_completions` | integer | 0-674 | Parks, recreation, leisure (CIP 31) |
| `philosophy_completions` | integer | 0-1,108 | Philosophy and religious studies (CIP 38) |
| `theology_completions` | integer | 0-2,080 | Theology (CIP 39) |
| `pscience_completions` | integer | 0-507 | Physical sciences (CIP 40) |
| `sciencetech_completions` | integer | 0-94 | Science technologies (CIP 41) |
| `psychology_completions` | integer | 0-2,302 | Psychology (CIP 42) |
| `security_completions` | integer | 0-1,806 | Homeland security, law enforcement (CIP 43) |
| `socialservice_completions` | integer | 0-1,997 | Public administration, social service (CIP 44) |
| `socialsciences_completions` | integer | 0-1,775 | Social sciences (CIP 45) |
| `construction_completions` | integer | 0-26 | Construction trades (CIP 46) |
| `mechanic_completions` | integer | 0-1 | Mechanic and repair (CIP 47) |
| `precision_completions` | integer | 0 (all zero) | Precision production (CIP 48) |
| `transpor_completions` | integer | 0-230 | Transportation (CIP 49) |
| `performing_completions` | integer | 0-804 | Visual and performing arts (CIP 50) |
| `clinical_completions` | integer | 0-2,738 | Health professions (CIP 51) |
| `business_completions` | integer | 0-6,647 | Business, management (CIP 52) |
| `history_completions` | integer | 0-619 | History (CIP 54) |

**Aggregate variables:**

| Variable | Type | Range | Description |
|----------|------|-------|-------------|
| `all_completions` | integer | 0-21,259 | Sum of all field completions |
| `noneducation_completions` | integer | 0-20,407 | all_completions minus education_completions. Primary placebo outcome. |
| `non_kraft_completions` | integer | 0-876 | Completions in fields not related to Kraft et al. policy reforms |

**Log-transformed variables:** Each of the above has a corresponding `l_` prefixed version (e.g., `l_noneducation_completions`). These are ln(x+1) transformations for regression use.
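The following sketch shows the aggregate and log construction. It assumes `all_completions` is the sum of the field-level counts passed in, as the table describes. The function name and counts are hypothetical. `math.log1p(x)` computes ln(1 + x), so zero counts map to 0 rather than to a missing value.

```python
import math


def placebo_outcomes(field_counts: dict[str, int]) -> dict[str, float]:
    """Add all/non-education aggregates and ln(x + 1) versions of every variable.

    `field_counts` maps completion variable names (e.g. 'education_completions') to counts.
    """
    out: dict[str, float] = dict(field_counts)
    out["all_completions"] = sum(field_counts.values())
    out["noneducation_completions"] = out["all_completions"] - field_counts["education_completions"]
    for name in list(out):  # iterate over a copy of the keys while adding new ones
        out["l_" + name] = math.log1p(out[name])
    return out


r = placebo_outcomes({"education_completions": 40, "business_completions": 60})
print(r["noneducation_completions"], round(r["l_noneducation_completions"], 3))
```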

#### `data/raw/placebo/placebo_enrollment_data.dta`

**Unit:** Institution-year (biennial)
**Rows:** 2,898 | **Columns:** 112
**Institutions:** 509 | **Years:** 2008, 2010, 2012, 2014, 2016, 2018

This file is structured identically to the main `enrollment_event_data.xlsx` but includes non-education enrollment as the key outcome:

| Variable | Type | Range | Description |
|----------|------|-------|-------------|
| `eftotlt_educ` | integer | 0-11,356 | Total education program enrollment (fall) |
| `enrollment_total` | integer | 0-104,068 | Total institutional enrollment (all programs) |
| `non_ed_enrollment` | integer | 0-102,553 | enrollment_total minus eftotlt_educ. Primary placebo outcome. |
| `l_non_ed_enrollment` | float | 0-11.54 | ln(non_ed_enrollment) |
| `pa2018` | integer | 0, 1 | Pennsylvania x 2018 indicator (controls for PA test change) |
| `sc2018` | integer | 0, 1 | South Carolina x 2018 indicator (controls for SC test change) |

All other variables (completions, enrollment demographics, institutional controls, treatment variables, event study indicators) match the main enrollment_event_data.xlsx structure.

---

### 4.6 Title II Data

#### `data/raw/title_ii/title_II_completer_clean.xlsx`

**Source:** Title II Higher Education Act reporting system (title2.ed.gov), cleaned by authors
**Unit:** State x program x year (annual)
**Rows:** 12,326 | **Columns:** 54
**Programs:** 891 | **States:** 22 | **Years:** 2011-2020

| Variable | Type | Missing | Values/Range | Description |
|----------|------|---------|-------------|-------------|
| `state` | string | 0 | 22 full state names | State name |
| `program` | string | 0 | 891 unique programs | Teacher preparation program name (e.g., "University of Alaska Anchorage") |
| `programtype` | string | 0 | Traditional, Alternative, Combined | Program pathway. "Traditional" = 4-year undergraduate; "Alternative" = post-baccalaureate or non-traditional entry. |
| `ipeds_enrollment_year` | integer | 0 | 2010-2019 | Corresponding IPEDS enrollment year (Title II report year minus 2) |
| `ipeds_completion_year` | integer | 0 | 2011-2020 | Corresponding IPEDS completion year (Title II report year minus 1) |
| `totalenrollment` | integer | 0 | 0-6,270 | Total program enrollment |
| `maleenrollment` | integer | 0 | -6 to 1,681 | Male enrollment. **-6 = suppressed** (per Title II privacy rules, counts < 6 are suppressed). |
| `femaleenrollment` | integer | 0 | -6 to 4,417 | Female enrollment. -6 = suppressed. |
| `hispanicenrollment` | integer | 0 | -6 to 1,009 | Hispanic enrollment. -6 = suppressed. |
| `asianenrollment` | integer | 0 | -6 to 355 | Asian enrollment. -6 = suppressed. |
| `blackenrollment` | integer | 0 | -6 to 2,083 | Black enrollment. -6 = suppressed. |
| `whiteenrollment` | integer | 0 | -6 to 3,253 | White enrollment. -6 = suppressed. |
| `completerscurrent` | integer | 0 | 0-2,109 | Total program completers in the current year |
| `malecompleters` | integer | 0 | 0-391 | Male completers |
| `femalecompleters` | integer | 0 | 0-544 | Female completers |
| `asiancompleters` | integer | 0 | 0-159 | Asian completers |
| `blackcompleters` | integer | 0 | 0-136 | Black completers |
| `hispaniccompleters` | integer | 0 | 0-145 | Hispanic completers |
| `whitecompleters` | integer | 0 | 0-483 | White completers |
| `min_ugmingpaentry` | float | 6,799 | 0.0-3.2 | Minimum undergraduate GPA required for program entry. Missing for programs that do not report a GPA threshold (55% of obs). |
| `State` | string | 0 | 22 abbreviations | State abbreviation |
| `shrinking_state` | integer | 0 | 0, 1 | Binary: state has declining school-age population |
| `nonwhite_completers` | integer | 0 | 0-2,109 | Total non-white completers |
| `log_completerscurrent` | float | 0 | 0-7.65 | ln(completerscurrent + 1) |
| `log_whitecompleters` | float | 0 | 0-6.18 | ln(whitecompleters + 1) |
| `log_nonwhite_completers` | float | 0 | 0-7.65 | ln(nonwhite_completers + 1) |
| `program_f` | string | 0 | 891 unique | Program name (used as fixed effect identifier in regressions) |
| *(remaining columns)* | | | | Same policy controls (`passevals` through `edtpa`), treatment variables (`continuous_treat`, `test_index`, `test_index_lag`), and year interactions (`year_2011` through `year_2020`) as in other analysis files. |
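The following sketch converts the `-6` suppression code to a missing value before any sums or means, and maps a Title II report year to the two IPEDS year variables. `report_year` is a hypothetical input, since no report-year column is listed above. The row values and helper names are also illustrative.

```python
from typing import Optional

SUPPRESSED = -6  # Title II code for suppressed counts


def clean_count(value: int) -> Optional[int]:
    """Convert the -6 suppression code to None so it never enters sums or means."""
    return None if value == SUPPRESSED else value


def ipeds_years(report_year: int) -> tuple[int, int]:
    """Map a Title II report year to (ipeds_enrollment_year, ipeds_completion_year)."""
    return report_year - 2, report_year - 1


# Hypothetical program row.
row = {"maleenrollment": -6, "femaleenrollment": 48}
cleaned = {k: clean_count(v) for k, v in row.items()}
print(cleaned)            # {'maleenrollment': None, 'femaleenrollment': 48}
print(ipeds_years(2013))  # (2011, 2012)
```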

#### `data/raw/title_ii_results/title_II_graduations_event_study.xlsx`

**Description:** Stata regression output table (not microdata). Contains 32 rows x 2 columns representing a formatted regression results table for the Title II completions event study (Figure A2).

| Content | Description |
|---------|-------------|
| Point estimates | Coefficients on year_2011 through year_2020 (omitting year_2012 as reference) |
| Standard errors | Clustered at state level, in parentheses |
| N = 5,554 | Number of program-year observations |
| Mean DV = 3.632 | Mean of log(completers) |
| FE: Program + Year | Fixed effects structure |

---

### 4.7 Composite Treatment Files

#### `data/raw/composite_treatment/`

This folder contains 10 pre-constructed event study panel files (5 enrollment + 5 graduation), each using a different TDI treatment measure. These are used for the alternative TDI regressions in Table 6.

**Enrollment files** (3,201 rows each, biennial 2008-2018, 566 institutions):
- `enrollments_event_data.xlsx` — composite TDI (average of 3 subjects)
- `enrollments_event_data_binding.xlsx` — binding (most restrictive) subject TDI
- `enrollments_event_data_math.xlsx` — math-only TDI
- `enrollments_event_data_reading.xlsx` — reading-only TDI
- `enrollments_event_data_writing.xlsx` — writing-only TDI

**Graduation files** (6,384 rows each, annual 2009-2020, 568 institutions):
- `graduation_event_data.xlsx` — composite TDI
- `graduation_event_data_binding.xlsx` — binding subject TDI
- `graduation_event_data_math.xlsx` — math-only TDI
- `graduation_event_data_reading.xlsx` — reading-only TDI
- `graduation_event_data_writing.xlsx` — writing-only TDI

These files share the variable structure of the main `enrollment_event_data.xlsx` and `graduation_event_data.xlsx` (documented in Sections 5.3 and 5.4). The key difference is which `z_score` and `continuous_treat` values are used:

| File Suffix | Treatment Variable Source | Description |
|-------------|--------------------------|-------------|
| *(none)* / `_composite` | Average of math, reading, writing z-scores | Baseline specification |
| `_binding` | Most restrictive subject's z-score for each state | Tests whether the hardest test drives effects |
| `_math` | Math z-score only | Subject-specific robustness |
| `_reading` | Reading z-score only | Subject-specific robustness |
| `_writing` | Writing z-score only | Subject-specific robustness |

The `_binding` files contain additional variables:
- `subject_binding` (string): Identifies which subject is binding (math or write) for each state
- `bind_treat` (float): DeltaTDI using the binding subject's z-score
- `bind_comp_treat` (float): Composite treatment using the binding subject
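Each suffix in the table above amounts to choosing which subject-level z-score feeds the treatment. The sketch below illustrates that choice for one state. It assumes each state has one z-score per subject and that "most restrictive" means the highest z-score, i.e. the passing score sitting furthest up the national distribution. The function names and example values are hypothetical and are not taken from the replication code.

```python
from statistics import mean

SUBJECTS = ("math", "reading", "writing")

def composite_z(z_by_subject):
    """Baseline specification: simple average of the three subject z-scores."""
    return mean(z_by_subject[s] for s in SUBJECTS)

def binding_z(z_by_subject):
    """Binding specification (assumed rule): the subject with the highest
    z-score, i.e. the hardest threshold relative to test takers."""
    subject = max(SUBJECTS, key=lambda s: z_by_subject[s])
    return subject, z_by_subject[subject]

def treatment_z(z_by_subject, suffix=""):
    """Return the z-score implied by each file suffix in the table above."""
    if suffix in ("", "_composite"):
        return composite_z(z_by_subject)
    if suffix == "_binding":
        return binding_z(z_by_subject)[1]
    subject = suffix.lstrip("_")
    if subject in SUBJECTS:
        return z_by_subject[subject]
    raise ValueError(f"unknown file suffix: {suffix!r}")

# Hypothetical z-scores for one state (illustration only)
example = {"math": 0.10, "reading": -0.50, "writing": -0.20}
print(round(treatment_z(example), 4))     # -0.2
print(binding_z(example))                 # ('math', 0.1)
print(treatment_z(example, "_reading"))   # -0.5
```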

---

## 5. Cleaned / Analysis-Ready Data Files

### 5.1 ets_treatment_data

**Files:** `data/cleaned/ets_treatment_data.xlsx` (36 cols) and `data/cleaned/ets_treatment_data.dta` (50 cols)
**Created by:** `code/01_clean_ets_data.R`
**Unit:** State-year (annual)
**Rows:** 312 | **States:** 24 | **Years:** 2008-2020

This is the state-level treatment file that provides the TDI and event study variables merged into all downstream analysis.

| Variable | Type | Missing | Values/Range | Description |
|----------|------|---------|-------------|-------------|
| `State` | string | 0 | 24 abbreviations | State postal abbreviation |
| `year` | integer | 0 | 2008-2020 | Calendar year (annual, including odd years filled forward from even-year data) |
| `treatment_year` | integer | 0 | 2013 | Year of PPST-to-Core transition (constant) |
| `time_till` | integer | 0 | -5 to 7 | Year - treatment_year |
| `test_index` | float | 0 | -1.16 to 0.10 | Non-composite TDI for this state-year. Changes over time as passing scores change. |
| `test_index_lead1` | float | 288 | -1.16 to 0.10 | TDI at time t+1. Only populated for the year before treatment (one obs per state). Used in graduation regressions. |
| `continuous_treat` | float | 0 | -0.51 to 0.75 | Non-composite DeltaTDI. Time-invariant within state. |
| `lead5` through `lead1` | float | 288 each | -0.51 to 0.75 | Event study pre-treatment indicators: equals `continuous_treat` for the observation in that lead period, NaN otherwise. In the .dta version, NaN is replaced with 0. |
| `lag0` through `lag7` | float | 288 each | -0.51 to 0.75 | Event study post-treatment indicators: equals `continuous_treat` for the observation in that lag period, NaN otherwise. |
| `test_index_composite` | float | 0 | -1.35 to 0.00 | Composite TDI for this state-year. Always <= 0 because composite scoring lowers the effective threshold. |
| `test_index_composite_lead1` | float | 288 | -1.35 to -0.25 | Composite TDI at time t+1 |
| `continuous_composite_treat` | float | 0 | -0.16 to 0.93 | Composite DeltaTDI. This is the primary treatment variable used in Tables 4-5 and Figures 3-4. |
| `composite_lead5`...`composite_lag7` | float | 288 each | -0.16 to 0.93 | Composite event study leads and lags |
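The lead and lag columns are deterministic functions of `year`, `treatment_year`, and `continuous_treat`. The sketch below reproduces that mapping for one state. It assumes `None` stands in for NaN, and `fill_zero=True` mimics the .dta replacement of NaN with 0 described above. The helper name is hypothetical, and this is not the code in `code/01_clean_ets_data.R`.

```python
TREATMENT_YEAR = 2013  # PPST-to-Core transition year (constant in this file)

def event_columns(year, continuous_treat, fill_zero=False):
    """Build time_till, lead5..lead1, and lag0..lag7 for one state-year row.

    Each lead/lag equals continuous_treat only in its own event period and is
    missing (None here, NaN in the .xlsx) otherwise. With fill_zero=True the
    missing values become 0.0, mirroring the .dta version.
    """
    time_till = year - TREATMENT_YEAR
    missing = 0.0 if fill_zero else None
    row = {"time_till": time_till}
    for k in range(5, 0, -1):          # lead5 ... lead1
        row[f"lead{k}"] = continuous_treat if time_till == -k else missing
    for k in range(0, 8):              # lag0 ... lag7
        row[f"lag{k}"] = continuous_treat if time_till == k else missing
    return row

# Hypothetical state with DeltaTDI = 0.4, observed annually 2008-2020
panel = [event_columns(y, 0.4) for y in range(2008, 2021)]
print(panel[0]["time_till"], panel[0]["lead5"])  # -5 0.4
print(panel[5]["lag0"], panel[5]["lead1"])       # 0.4 None
```

Because each column is populated in exactly one of a state's 13 years, every lead/lag has 12 missing values per state. Across 24 states that gives the 288 missing values per column reported in the table.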

**Stata-only additional variables (in .dta but not .xlsx):**

| Variable | Type | Values | Description |
|----------|------|--------|-------------|
| `time` | string | "new", "old" | Test era for the state in this year |
| `year_2008`...`year_2020` | float | 0 or treatment value | Year x treatment interaction terms: equal to the treatment value in that year and 0 otherwise. year_2012 = 0 (reference). Used directly in Stata `reghdfe` specifications. |

---

### 5.2 ipeds_data_cleaned

**File:** `data/cleaned/ipeds_data_cleaned.xlsx`
**Created by:** `code/02_clean_ipeds_data.R`
**Unit:** Institution-year
**Rows:** 20,117 | **Columns:** 117
**Institutions:** ~1,705 | **Years:** 2008-2020

This is the comprehensive IPEDS dataset, combining more than 10 IPEDS data topics. It covers all degree-granting institutions in the U.S. and its territories, not just the analysis-sample states.

#### Identification Variables

| Variable | Type | Missing | Values/Range | Description |
|----------|------|---------|-------------|-------------|
| `unitid` | integer | 0 | 100,654-495,767 | IPEDS institutional unit ID. Unique identifier for each institution. |
| `year` | integer | 0 | 2008-2020 | Academic year |
| `inst_name` | string | 61 | text | Institution name (e.g., "University of Vermont") |
| `inst_alias` | string | 11,559 | text | Alternative name or abbreviation. Mostly missing. |
| `state_abbr` | string | 61 | 58 values | State/territory abbreviation (all 50 states + DC + territories) |
| `fips` | string | 61 | state names | State name corresponding to FIPS code |
| `county_fips` | float | 1,568 | varies | County-level FIPS code |
| `region` | string | 61 | 9 Census regions | Census Bureau region |

#### Teacher Preparation Completions (CIP-6 level)

18 variables tracking completions in teacher preparation programs (CIP codes 13.01, 13.02, 13.03, 13.10, 13.12, 13.13, 13.14, 13.99) at the BA and MA level. All integer type, 0 missing.

| Variable Pattern | Range | Description |
|------------------|-------|-------------|
| `ba_teacher_preparation_completions_{total,male,female,white,black,hispanic}` | 0-2,399 | BA-level teacher prep completions by demographic |
| `ma_teacher_preparation_completions_{total,male,female,white,black,hispanic}` | 0-5,363 | MA-level teacher prep completions by demographic |
| `ba_masters_teacher_preparation_completions_{total,male,female,white,black,hispanic}` | 0-7,762 | Combined BA+MA teacher prep completions by demographic |

#### Education CIP-2 Completions

Same 18-variable structure as teacher preparation, but using CIP code 13 (all education) rather than the narrower teacher prep CIP-6 codes.

| Variable Pattern | Range | Description |
|------------------|-------|-------------|
| `ba_educationcip2_{total,male,female,white,black,hispanic}_completions` | 0-2,399 | BA education completions |
| `ma_educationcip2_{total,male,female,white,black,hispanic}_completions` | 0-6,508 | MA education completions |
| `ba_masters_educationcip2_{total,male,female,white,black,hispanic}_completions` | 0-8,907 | Combined BA+MA education completions |
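The two completion families differ only in how CIP codes are matched. The sketch below shows one way to classify a code. It assumes CIP codes arrive as strings or numbers such as `"13.1202"` and that teacher preparation is matched on the four-digit families listed above. The function names are hypothetical, and the replication code may classify codes differently.

```python
TEACHER_PREP_FAMILIES = {"13.01", "13.02", "13.03", "13.10",
                         "13.12", "13.13", "13.14", "13.99"}

def normalize_cip(cip):
    """Return a CIP code as 'XX.XXXX', right-padding short codes like '13.1'."""
    series, _, rest = str(cip).partition(".")
    return f"{int(series):02d}.{rest.ljust(4, '0')[:4]}"

def is_education_cip2(cip):
    """CIP-2 family 13: all education programs."""
    return normalize_cip(cip).startswith("13.")

def is_teacher_prep(cip):
    """Teacher-preparation subset: four-digit family in the list above."""
    return normalize_cip(cip)[:5] in TEACHER_PREP_FAMILIES

for code in ("13.1202", "13.0401", "27.0101"):
    print(code, is_education_cip2(code), is_teacher_prep(code))
```

Every teacher-preparation code is also an education (CIP-2) code, so the CIP-2 counts should never fall below the matching teacher-prep counts. That is consistent with the maximum ranges in the two tables.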

#### Total Completions

| Variable | Type | Missing | Range | Description |
|----------|------|---------|-------|-------------|
| `total_ba_completions` | float | 1,731 | 0-8,449 | Total BA completions across all fields |
| `total_ma_completions` | float | 4,404 | 0-7,935 | Total MA completions across all fields |
| `total_ba_masters_completions` | float | 4,619 | 0-16,384 | Total BA+MA completions across all fields |

#### Education Enrollment (18 variables)

All float type, ~9,552 missing (available for ~10,565 observations). These track enrollment in education programs from the IPEDS fall enrollment survey.

| Variable Pattern | Range | Description |
|------------------|-------|-------------|
| `all_{total,male,female,white,black,hispanic}_ed_enrollment` | 0-17,291 | All-level education enrollment by demographic |
| `undergrad_{total,male,female,white,black,hispanic}_ed_enrollment` | 0-8,987 | Undergraduate education enrollment |
| `graduate_{total,male,female,white,black,hispanic}_ed_enrollment` | 0-11,155 | Graduate education enrollment |

#### Institutional Characteristics (categorical)

| Variable | Missing | Values | Description |
|----------|---------|--------|-------------|
| `inst_affiliation` | 61 | Public; Private not-for-profit (religious); Private not-for-profit (no religious); Private for-profit | Institutional affiliation |
| `inst_control` | 61 | Public, Private not-for-profit, Private for-profit | Institutional control type |
| `inst_category` | 61 | 5 categories | Degree-granting category |
| `hbcu` | 61 | Yes, No | Historically Black College/University |
| `open_admissions_policy` | 61 | Yes, No, Not applicable | Open admissions |
| `reqt_test_scores` | 1,678 | Required, Recommended, Considered, Neither required nor recommended, Not applicable | Test score requirement for admission |
| `bach_offered` | 61 | Yes, No | Offers bachelor's degree |
| `masters_offered` | 61 | Yes, No, Not applicable | Offers master's degree |
| `teacher_cert` | 61 | Yes, No, Missing/not reported | Offers teacher certification |
| `teacher_cert_state_approved` | 61 | Yes, No, Missing/not reported | State-approved teacher cert program |
| `oncampus_housing` | 61 | Yes, No, Missing/not reported | Has on-campus housing |
| `rotc` | 61 | Yes, No, Missing/not reported | Has ROTC program |
| `ap_credit` | 61 | Yes, No, Missing/not reported | Accepts AP credit |
| *(and 15+ additional institutional characteristic variables)* | | | |

#### Admissions and Testing

| Variable | Type | Missing | Range | Description |
|----------|------|---------|-------|-------------|
| `sat_number_submitting` | float | 5,390 | 0-10,399 | Count of students submitting SAT scores |
| `sat_percent_submitting` | float | 5,421 | 0-100 | Percent of applicants submitting SAT |
| `act_number_submitting` | float | 5,384 | 0-8,999 | Count of students submitting ACT scores |
| `act_percent_submitting` | float | 5,422 | 0-100 | Percent of applicants submitting ACT |
| `sat_crit_read_25_pctl` | float | 6,465 | 210-720 | SAT Critical Reading 25th percentile |
| `sat_crit_read_75_pctl` | float | 6,465 | 260-790 | SAT Critical Reading 75th percentile |
| `sat_math_25_pctl` | float | 6,357 | 200-800 | SAT Math 25th percentile |
| `sat_math_75_pctl` | float | 6,357 | 325-800 | SAT Math 75th percentile |
| `act_composite_25_pctl` | float | 6,015 | 3-34 | ACT Composite 25th percentile |
| `act_composite_75_pctl` | float | 6,015 | 8-36 | ACT Composite 75th percentile |
| `enrollment_fall_fulltime_firsttime_undergrad` | float | 749 | 1-19,368 | Fall first-time full-time undergraduate enrollment |

#### Financial Aid and Outcomes

| Variable | Type | Missing | Range | Description |
|----------|------|---------|-------|-------------|
| `pell_percent` | float | 253 | 0-1 | Proportion of undergrads receiving Pell grants (0-1 scale) |
| `pell_average_amount` | float | 353 | 214-7,893 | Average Pell grant dollar amount |
| `student_faculty_ratio.x` | float | 1,767 | 1-107 | Student-to-faculty ratio |
| `full_time_undergrad_retention_rate` | float | 1,083 | 0-1 | First-year full-time retention rate (0-1 scale) |
| `completion_rate_150pct` | float | 16,101 | 0.006-1.0 | Graduation rate within 150% of normal time |

---

### 5.3 enrollment_event_data

**File:** `data/cleaned/enrollment_event_data.xlsx`
**Created by:** `code/04_merge_event_data.R`
**Unit:** Institution-year (biennial)
**Rows:** 3,201 | **Columns:** 124
**Institutions:** 566 | **Years:** 2008, 2010, 2012, 2014, 2016, 2018
**States:** 24

This is the primary analysis file for enrollment regressions (Table 4, Figure 3).

#### Identification

| Variable | Type | Missing | Values/Range | Description |
|----------|------|---------|-------------|-------------|
| `unitid` | integer | 0 | 102,553-489,937 | IPEDS institution ID |
| `year` | integer | 0 | 2008, 2010, 2012, 2014, 2016, 2018 | Biennial observation year |
| `name` | string | 0 | 673 unique | Institution name |
| `city` | string | 0 | 392 unique | City |
| `State` | string | 0 | 24 abbreviations | State abbreviation (uppercase variable) |
| `state` | string | 0 | 24 abbreviations | State abbreviation (lowercase variable; same values as `State`) |
| `statefips` | integer | 0 | 2-55 | State FIPS code |
| `statename` | string | 0 | full names | State name |
| `state_name` | string | 0 | full names | State name (alternate formatting; DC = "Washington DC") |

#### Dependent Variables: Education Enrollment (Fall)

These are CIP-13 education program enrollment counts from the IPEDS fall enrollment survey.

| Variable | Type | Missing | Range | Description |
|----------|------|---------|-------|-------------|
| `eftotlt` | integer | 0 | 0-11,356 | Total fall enrollment in education programs |
| `eftotlm` | float | 41 | 0-2,180 | Male education enrollment |
| `eftotlw` | float | 41 | 0-9,176 | Female education enrollment |
| `efbkaat` | integer | 0 | 0-2,080 | Black education enrollment |
| `efbkaam` | float | 41 | 0-539 | Black male education enrollment |
| `efbkaaw` | float | 41 | 0-1,759 | Black female education enrollment |
| `efhispt` | integer | 0 | 0-798 | Hispanic education enrollment |
| `efhispm` | float | 41 | 0-152 | Hispanic male education enrollment |
| `efhispw` | float | 41 | 0-646 | Hispanic female education enrollment |
| `efwhitt` | integer | 0 | 0-5,155 | White education enrollment |
| `efwhitm` | float | 41 | 0-1,028 | White male education enrollment |
| `efwhitw` | float | 41 | 0-4,181 | White female education enrollment |
| `efaiant` | float | 41 | 0-291 | American Indian/Alaska Native education enrollment |
| `efasiat` | float | 41 | 0-526 | Asian education enrollment |
| `ef2mort` | float | 41 | 0-339 | Two or more races education enrollment |
| `efunknt` | float | 41 | 0-3,203 | Unknown race education enrollment |

#### Dependent Variables: Log-Transformed

| Variable | Type | Missing | Range | Description |
|----------|------|---------|-------|-------------|
| `l_eftotlt` | float | 0 | 0-9.34 | ln(eftotlt + 1). **Primary enrollment outcome variable** (Table 4). |
| `l_efbkaat` | float | 0 | 0-7.64 | ln(efbkaat + 1) |
| `l_efwhitt` | float | 0 | 0-8.55 | ln(efwhitt + 1) |
| `l_efhispt` | float | 0 | 0-6.68 | ln(efhispt + 1) |
| `non_white` | integer | 0 | 0-6,201 | Total non-white enrollment (eftotlt - efwhitt) |
| `l_nonwhite` | float | 0 | 0-8.73 | ln(non_white + 1) |
| `black_hispanic` | integer | 0 | 0-2,546 | Black + Hispanic enrollment |
| `l_blackhipanic` | float | 0 | 0-7.84 | ln(black_hispanic + 1) |
| `l_efbkaat2` | float | 700 | 0-7.64 | ln(efbkaat + 1), alternate construction |
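The derived counts and log outcomes in the table above follow directly from the raw IPEDS counts. The sketch below computes them for a single row, using the definitions stated in the table. The row values and the helper name are hypothetical.

```python
import math

def log1p_count(x):
    """ln(x + 1): defined for zero counts, since ln(0 + 1) = 0."""
    return math.log1p(x)

def enrollment_outcomes(row):
    """Derive the log outcomes in the table above from raw IPEDS counts."""
    non_white = row["eftotlt"] - row["efwhitt"]        # includes unknown race
    black_hispanic = row["efbkaat"] + row["efhispt"]
    return {
        "l_eftotlt": log1p_count(row["eftotlt"]),
        "l_efwhitt": log1p_count(row["efwhitt"]),
        "non_white": non_white,
        "l_nonwhite": log1p_count(non_white),
        "black_hispanic": black_hispanic,
        "l_blackhipanic": log1p_count(black_hispanic),  # name spelled as in the file
    }

# Hypothetical institution-year
out = enrollment_outcomes({"eftotlt": 100, "efwhitt": 60, "efbkaat": 25, "efhispt": 10})
print(out["non_white"], out["black_hispanic"], round(out["l_eftotlt"], 2))  # 40 35 4.62
print(round(log1p_count(11356), 2))  # 9.34, the upper end of l_eftotlt's range
```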

#### Dependent Variables: Completions

| Variable | Type | Missing | Range | Description |
|----------|------|---------|-------|-------------|
| `ctotalt` | integer | 0 | 0-2,847 | Total education completions |
| `ctotalm` | integer | 0 | 0-511 | Male completions |
| `ctotalw` | integer | 0 | 0-2,336 | Female completions |
| `casiat` | integer | 0 | 0-137 | Asian completions |
| `cbkaat` | integer | 0 | 0-310 | Black completions |
| `cbkaam` | integer | 0 | 0-67 | Black male completions |
| `cbkaaw` | integer | 0 | 0-254 | Black female completions |
| `chispt` | integer | 0 | 0-167 | Hispanic completions |
| `chispm` | integer | 0 | 0-32 | Hispanic male completions |
| `chispw` | integer | 0 | 0-135 | Hispanic female completions |
| `cwhitt` | integer | 0 | 0-1,055 | White completions |
| `cwhitm` | integer | 0 | 0-242 | White male completions |
| `cwhitw` | integer | 0 | 0-917 | White female completions |

#### Institutional Controls

| Variable | Type | Missing | Range | Description |
|----------|------|---------|-------|-------------|
| `ccbasic` | integer | 0 | -3 to 33 | Carnegie Classification (basic). See Section 3 for code values. |
| `carnegie` | integer | 0 | -3 to 60 | Carnegie Classification (detailed) |
| `satvr25` | float | 838 | 290-710 | SAT Verbal 25th percentile (time-varying) |
| `satmt25` | float | 824 | 250-750 | SAT Math 25th percentile (time-varying) |
| `actcm25` | float | 805 | 6-33 | ACT Composite 25th percentile (time-varying) |
| `test_optional` | integer | 0 | 0, 1 | Institution does not require standardized test scores |
| `satpct2` | integer | 0 | 0-100 | Percent submitting SAT (baseline, time-invariant) |
| `actpct2` | integer | 0 | 0-100 | Percent submitting ACT (baseline, time-invariant) |
| `satvr25_2` / `satvr75_2` | integer | 0 | 290-710 / 390-800 | SAT Verbal 25th/75th percentile (baseline) |
| `satmt25_2` / `satmt75_2` | integer | 0 | 250-750 / 390-800 | SAT Math 25th/75th percentile (baseline) |
| `actcm25_2` / `actcm75_2` | integer | 0 | 6-33 / 15-35 | ACT 25th/75th percentile (baseline) |
| `pell_percent2` | integer | 0 | 0-100 | Pell grant recipient percent (baseline) |
| `pell_amount2` | integer | 0 | 434-7,893 | Average Pell grant amount (baseline) |
| `loan_percent2` | integer | 0 | 0-100 | Student loan recipient percent (baseline) |
| `loan_average2` | integer | 0 | 426-21,539 | Average loan amount (baseline) |
| `enrollment_total2` | integer | 0 | 1-8,445 | Total enrollment (baseline) |

**Note on `_2` suffix:** Variables ending in `_2` are **baseline values** (typically from 2010 or the earliest available year) held constant across all years. Using them as time-invariant controls avoids the endogeneity that would arise from controlling for institutional characteristics that could themselves respond to the policy change.
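The baseline rule can be sketched as "take the 2010 value, otherwise the earliest reported value, and broadcast it to every year". The sketch assumes each institution's history is a year-to-value mapping with `None` for unreported years. Its reading of "earliest available year" as the earliest year with a non-missing value is an assumption, and the function names are hypothetical.

```python
BASELINE_YEAR = 2010

def baseline_value(history, baseline_year=BASELINE_YEAR):
    """Baseline-year value if reported; otherwise the earliest non-missing
    value (assumed rule). Returns None if the variable is never reported."""
    if history.get(baseline_year) is not None:
        return history[baseline_year]
    reported = sorted(y for y, v in history.items() if v is not None)
    return history[reported[0]] if reported else None

def add_baseline(history, years):
    """Broadcast the baseline to every panel year (a time-invariant control)."""
    value = baseline_value(history)
    return {year: value for year in years}

satvr25 = {2008: None, 2010: 480, 2012: 500, 2014: 510}   # hypothetical
print(add_baseline(satvr25, [2008, 2010, 2012]))  # {2008: 480, 2010: 480, 2012: 480}
```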

#### State-Level Controls

| Variable | Type | Missing | Range | Description |
|----------|------|---------|-------|-------------|
| `real_income` | float | 0 | 41,312-86,764 | State real per-capita income |
| `unemployment_rate` | float | 0 | 2.4-13.8 | State unemployment rate (%) |
| `cohort_5_9` | integer | 0 | 27,268-760,840 | State population ages 5-9 |
| `cohort_10_14` | integer | 0 | 23,874-784,144 | State population ages 10-14 |
| `cohort_15_17` | integer | 0 | 15,481-516,111 | State population ages 15-17 |
| `population` | integer | 0 | 591,833-12,807,060 | Total state population |
| `pop_5_17` | integer | 0 | 68,406-2,058,783 | School-age population (5-17) |
| `percentage_change_08_18` | float | 0 | -10.46 to 21.76 | Percent change in school-age pop, 2008-2018 |
| `pop_percentage_change` | float | 0 | -10.46 to 21.76 | Same as `percentage_change_08_18` |
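The sketch below assumes `percentage_change_08_18` is the simple percent change in `pop_5_17` between 2008 and 2018. The `is_shrinking` rule, which flags any negative change, is a hypothetical reading of `shrinking_state` ("declining school-age population"). The codebook does not state the exact threshold, and the populations below are made up.

```python
def pct_change(pop_2008, pop_2018):
    """Percent change in school-age population, 2008-2018."""
    return 100.0 * (pop_2018 - pop_2008) / pop_2008

def is_shrinking(pop_2008, pop_2018):
    """Hypothetical rule: 1 if the school-age population fell, else 0."""
    return int(pct_change(pop_2008, pop_2018) < 0)

print(pct_change(200_000, 190_000), is_shrinking(200_000, 190_000))  # -5.0 1
```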

#### Education Policy Controls (all binary 0/1)

| Variable | Description | Source |
|----------|-------------|--------|
| `passevals` | State passed teacher evaluation legislation | Kraft et al. (2020) |
| `implementevals` | State implemented teacher evaluations | Kraft et al. (2020) |
| `eliminate_tenure` | State eliminated teacher tenure | Kraft et al. (2020) |
| `increase_probationary_period` | State increased probationary period | Kraft et al. (2020) |
| `weaken_bargaining` | State weakened collective bargaining | Kraft et al. (2020) |
| `eliminate_union_dues` | State eliminated mandatory union dues | Kraft et al. (2020) |
| `won_race_top` | State won Race to the Top grant | Kraft et al. (2020) |
| `common_core` | State adopted Common Core standards | Kraft et al. (2020) |
| `edtpa` | State requires edTPA | Chung & Zou (2023). Always 0 in enrollment sample. |

#### Special Controls

| Variable | Type | Values | Description |
|----------|------|--------|-------------|
| `pa2018` | integer | 0, 1 | Pennsylvania x year=2018 indicator. Controls for PA changing its Praxis Math passing score in 2017 (from 150 to 142). |
| `sc2018` | integer | 0, 1 | South Carolina x year=2018 indicator. Controls for SC dropping Praxis Core requirement in 2019. |

#### Treatment Variables

| Variable | Type | Missing | Values/Range | Description |
|----------|------|---------|-------------|-------------|
| `ID` | integer | 0 | e.g., 20080720 | Year + test code composite |
| `passingscore` | float | 39 | 158-176 | ETS passing score for this state-year |
| `test` | integer | 0 | 720, 5722 | ETS test code (720=PPST Writing, 5722=Core Writing) |
| `subject` | string | 0 | "write" | Subject (writing used as reference subject in enrollment file) |
| `time` | string | 0 | "new", "old" | Test era |
| `test_name` | string | 0 | "new_write", "old_write" | Combined test era + subject |
| `test_mean` | float | 0 | 163.7, 175.5 | National mean for this test |
| `test_sd` | float | 0 | 3.9, 11.6 | National SD for this test |
| `z_score` | float | 342 | -1.15 to 0.13 | Z-score of passing score (non-composite, for this subject) |
| `passingscore_composite` | float | 39 | 158-175 | Composite-adjusted passing score |
| `z_score_composite` | float | 39 | -1.92 to -0.13 | Composite z-score |
| `test_index` | float | 39 | -1.52 to -0.25 | Composite TDI for this state-year |
| `test_index_noncomposite` | float | 342 | -1.16 to 0.10 | Non-composite TDI |
| `test_index_lead1` | float | 2,660 | -1.52 to -0.25 | TDI at t+1 (for lagged specifications) |
| `treat` | float | 2,660 | -0.16 to 1.10 | DeltaTDI (composite) -- only populated for first post-treatment obs |
| `continuous_treatment_amount` | float | 1,584 | -0.16 to 1.10 | DeltaTDI, only for post-treatment years |
| `continuous_treat` | float | 0 | -0.16 to 1.10 | **Primary treatment variable.** Composite DeltaTDI. Time-invariant within state. |
| `treatment_year` | integer | 0 | 2013 | Treatment year (constant) |
| `time_till` | integer | 0 | -5, -3, -1, 1, 3, 5 | Years relative to treatment (year - treatment_year). Only odd values occur because observations are biennial. |
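The sketch below illustrates the standardization behind `z_score` and the treatment change. It assumes `z_score = (passingscore - test_mean) / test_sd` and that DeltaTDI is the post-transition TDI minus the pre-transition TDI. Neither formula is spelled out in this section, and the test moments and cut scores are hypothetical rather than the values in the file.

```python
def z_score(passing_score, test_mean, test_sd):
    """Standardize a state's passing score against the national score distribution."""
    return (passing_score - test_mean) / test_sd

def delta_tdi(tdi_old, tdi_new):
    """Assumed definition: change in TDI from the PPST era to the Core era."""
    return tdi_new - tdi_old

# Hypothetical test moments and cut scores (illustration only)
old = z_score(172, test_mean=174.0, test_sd=4.0)    # PPST-era threshold
new = z_score(165, test_mean=160.0, test_sd=10.0)   # Core-era threshold
print(old, new, delta_tdi(old, new))                # -0.5 0.5 1.0
```

A positive DeltaTDI means the new passing score sits further up its test's national distribution than the old one did. In other words, the standard became harder to meet.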

#### Event Study Indicators

| Variable | Type | Values | Description |
|----------|------|--------|-------------|
| `lead5` | float | 0 or treatment value | DeltaTDI x I(year=2008). Pre-treatment, 5 years before. |
| `lead4` | integer | 0 | Always zero: time_till = -4 corresponds to 2009, which is not observed in the biennial data |
| `lead3` | float | 0 or treatment value | DeltaTDI x I(year=2010) |
| `lead2` | integer | 0 | Always zero (even-numbered leads omitted in biennial structure) |
| `lead1` | float | 0 or treatment value | DeltaTDI x I(year=2012). This is the last pre-treatment period. |
| `lag0` | integer | 0 | Always zero: time_till = 0 corresponds to 2013, the transition year itself, which is not observed in the biennial data |
| `lag1` | float | 0 or treatment value | DeltaTDI x I(year=2014). First post-treatment. |
| `lag2` | integer | 0 | Always zero |
| `lag3` | float | 0 or treatment value | DeltaTDI x I(year=2016) |
| `lag4` | integer | 0 | Always zero |
| `lag5` | float | 0 or treatment value | DeltaTDI x I(year=2018) |
| `lag6`, `lag7` | integer | 0 | Always zero (beyond sample) |
| `year_2008`...`year_2018` | float/int | 0 or treatment value | Year x treatment interactions. year_2012 = 0 (reference). |
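In the biennial panel, only six event-time columns can ever be non-zero. The sketch below makes that mapping explicit and builds the `year_YYYY` interactions with 2012 as the reference year. It assumes the interactions are defined only for the observed even years, and the helper names are hypothetical.

```python
TREATMENT_YEAR = 2013
BIENNIAL_YEARS = (2008, 2010, 2012, 2014, 2016, 2018)

def event_label(year):
    """Map a calendar year to its lead/lag column name."""
    k = year - TREATMENT_YEAR
    return f"lead{-k}" if k < 0 else f"lag{k}"

def year_interactions(year, continuous_treat, reference_year=2012):
    """year_YYYY = DeltaTDI x 1(year == YYYY); the reference year is always 0."""
    return {
        f"year_{y}": (continuous_treat if (y == year and y != reference_year) else 0.0)
        for y in BIENNIAL_YEARS
    }

# Columns that can be non-zero in the enrollment panel
print([event_label(y) for y in BIENNIAL_YEARS])
# ['lead5', 'lead3', 'lead1', 'lag1', 'lag3', 'lag5']
```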

---

### 5.4 graduation_event_data

**File:** `data/cleaned/graduation_event_data.xlsx`
**Created by:** `code/04_merge_event_data.R`
**Unit:** Institution-year (annual)
**Rows:** 6,384 | **Columns:** 103
**Institutions:** 568 | **Years:** 2009-2020
**States:** 24

This is the primary analysis file for graduation regressions (Table 5, Figure 4). It shares most variables with the enrollment file but has annual data and graduation-specific outcomes.

**Key differences from enrollment_event_data:**
- Annual (not biennial) -- 12 years instead of 6
- Dependent variables are completions, not enrollment
- Includes `selective` (binary) for selectivity subsample analysis
- Includes baseline selectivity controls: `satv_2008`, `satm_2008`, `act_2008`, `satv_2010`, `act_2010`
- `edtpa` can be 0 or 1 (unlike enrollment where it's always 0)
- Event study leads/lags are annual (lead1-lead5, lag0-lag7 all potentially non-zero)

#### Graduation-Specific Dependent Variables

| Variable | Type | Missing | Range | Description |
|----------|------|---------|-------|-------------|
| `ctotalt` | integer | 0 | 0-3,041 | Total teacher prep completions |
| `ctotalm` | integer | 0 | 0-511 | Male completions |
| `ctotalw` | integer | 0 | 0-2,548 | Female completions |
| `cbkaat` | integer | 0 | 0-443 | Black completions |
| `chispt` | integer | 0 | 0-218 | Hispanic completions |
| `cwhitt` | integer | 0 | 0-1,763 | White completions |
| `c2mort` | integer | 0 | 0-82 | Two or more races completions |
| `cunknt` | integer | 0 | 0-1,426 | Unknown race completions |
| `casiat` | integer | 0 | 0-140 | Asian completions |
| `l_ctotalt` | float | 0 | 0-8.02 | ln(ctotalt + 1). **Primary graduation outcome** (Table 5). |
| `l_cbkaat` | float | 0 | 0-6.10 | ln(cbkaat + 1) |
| `l_chispt` | float | 0 | 0-5.39 | ln(chispt + 1) |
| `l_cwhitt` | float | 0 | 0-7.48 | ln(cwhitt + 1) |
| `l_c2mort` | float | 0 | 0-4.42 | ln(c2mort + 1) |
| `l_cunknt` | float | 0 | 0-7.26 | ln(cunknt + 1) |
| `cnonwhite` | integer | 0 | 0-2,955 | Non-white completions |
| `l_cnonwhite` | float | 0 | 0-7.99 | ln(cnonwhite + 1) |
| `l_male` | float | 0 | 0-6.24 | ln(ctotalm + 1) |
| `l_female` | float | 0 | 0-7.84 | ln(ctotalw + 1) |

#### Selectivity Controls

| Variable | Type | Missing | Range | Description |
|----------|------|---------|-------|-------------|
| `selective` | integer | 0 | 0, 1 | Binary: 1 if institution's 2010 SAT/ACT 25th percentile > sample median |
| `satv_2008` | integer | 0 | 0-680 | SAT Verbal 25th percentile in 2008. 0 = not reported. |
| `satm_2008` | float | 1,699 | 290-680 | SAT Math 25th percentile in 2008 |
| `act_2008` | integer | 0 | 0-30 | ACT Composite 25th percentile in 2008. 0 = not reported. |
| `satv_2010` | integer | 0 | 0-670 | SAT Verbal 25th percentile in 2010 |
| `act_2010` | float | 1,716 | 12-30 | ACT Composite 25th percentile in 2010 |
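The `selective` flag is a median split. The sketch below assumes a single 2010 25th-percentile score per institution, which sidesteps the SAT/ACT combination the definition implies. It also assumes zeros or missing values (not reported) are excluded from the median and flagged 0. Both are assumptions, since the codebook does not say how non-reporters are handled. The unitids and scores are hypothetical.

```python
from statistics import median

def selective_flags(scores_2010):
    """1 if an institution's 2010 25th-percentile score exceeds the sample median.

    scores_2010 maps unitid -> score; 0 or None means not reported and is
    excluded from the median (assumed). Raises StatisticsError if nobody reports.
    """
    reported = [s for s in scores_2010.values() if s]
    cutoff = median(reported)
    return {uid: int(bool(s) and s > cutoff) for uid, s in scores_2010.items()}

print(selective_flags({1: 450, 2: 500, 3: 550, 4: 0}))  # {1: 0, 2: 0, 3: 1, 4: 0}
```

Institutions exactly at the median get 0, matching the "at or below 2010 median" definition of the less selective subsample.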

---

### 5.5 titleII_final_data

**File:** `data/cleaned/titleII_final_data.xlsx`
**Created by:** `code/04_merge_event_data.R` or pre-constructed
**Unit:** State-program-year
**Rows:** 23,672 | **Columns:** 21
**Programs:** ~2,551 | **States:** 58 (all U.S. states, DC, and territories) | **Years:** 2012-2022

| Variable | Type | Missing | Values/Range | Description |
|----------|------|---------|-------------|-------------|
| `state` | string | 0 | 58 state/territory names | State name (full) |
| `reportyear` | integer | 0 | 2012-2022 | Title II reporting year |
| `program` | string | 0 | 2,551 unique | Teacher preparation program name |
| `programtype` | string | 0 | Traditional, Alternative | Program pathway type. "Traditional" (68%) = undergraduate pathway; "Alternative" (32%) = post-baccalaureate or non-traditional. |
| `ipeds_enrollment_year` | integer | 0 | 2010-2020 | IPEDS-equivalent enrollment year (reportyear - 2) |
| `ipeds_completion_year` | integer | 0 | 2011-2021 | IPEDS-equivalent completion year (reportyear - 1) |
| `totalenrollment` | integer | 0 | 0-80,530 | Total program enrollment |
| `maleenrollment` | integer | 0 | -6 to 23,153 | Male enrollment. **-6 = suppressed** (Title II privacy rule). |
| `femaleenrollment` | integer | 0 | -6 to 49,249 | Female enrollment. -6 = suppressed. |
| `hispanicenrollment` | integer | 0 | -6 to 12,689 | Hispanic enrollment. -6 = suppressed. |
| `asianenrollment` | integer | 0 | -6 to 1,500 | Asian enrollment. -6 = suppressed. |
| `blackenrollment` | integer | 0 | -6 to 12,374 | Black enrollment. -6 = suppressed. |
| `whiteenrollment` | integer | 0 | -6 to 26,412 | White enrollment. -6 = suppressed. |
| `completerscurrent` | integer | 0 | 0-6,121 | Total completers in the current reporting year |
| `malecompleters` | integer | 0 | 0-1,789 | Male completers |
| `femalecompleters` | integer | 0 | 0-4,332 | Female completers |
| `asiancompleters` | integer | 0 | 0-201 | Asian completers |
| `blackcompleters` | integer | 0 | 0-1,235 | Black completers |
| `hispaniccompleters` | integer | 0 | 0-1,507 | Hispanic completers |
| `whitecompleters` | integer | 0 | 0-3,073 | White completers |
| `min_ugmingpaentry` | float | 4,764 | 0.0-3.5 | Minimum undergraduate GPA for program entry. Missing for ~20% of observations. Common values: 2.5, 2.75, 3.0. |

---

## 6. Variable Naming Conventions

### Prefix Conventions

| Prefix | Meaning | Example |
|--------|---------|---------|
| `l_` | Natural log transformation: ln(x + 1) | `l_eftotlt` = ln(total enrollment + 1) |
| `ef` | IPEDS fall enrollment | `efbkaat` = fall enrollment, Black, total |
| `c` | IPEDS completions | `ctotalt` = completions, total |
| `ba_` | Bachelor's degree level | `ba_teacher_preparation_completions_total` |
| `ma_` | Master's degree level | `ma_educationcip2_total_completions` |
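| `log_` | Natural log (alternate convention; check each variable, since some use ln(x + 1)) | `log_licenses` = ln(licenses); `log_completerscurrent` = ln(completerscurrent + 1) |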

### Suffix Conventions

| Suffix | Meaning | Example |
|--------|---------|---------|
| `_2` | Baseline/time-invariant value (typically 2010) | `satvr25_2` = SAT Verbal 25th pctl, baseline |
| `_composite` | Uses composite TDI (average of 3 subjects) | `continuous_composite_treat` |
| `_lead1` | Lead of one period | `test_index_lead1` |
| `t` | Total (both genders) | `eftotlt` = enrollment, total |
| `m` | Male | `eftotlm` = enrollment, male |
| `w` | Female | `eftotlw` = enrollment, female |

### IPEDS Race Codes (in variable names)

| Code | Race/Ethnicity |
|------|---------------|
| `bkaa` | Black or African American |
| `hisp` | Hispanic or Latino |
| `whit` | White |
| `asia` | Asian |
| `aian` | American Indian or Alaska Native |
| `2mor` | Two or more races |
| `unkn` | Race/ethnicity unknown |
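Together, the prefix, race-code, and sex-suffix tables form a small grammar for the raw IPEDS count names. The decoder below assumes names follow `[l_]{ef|c}{group}{t|m|w}` exactly. Treating `totl`/`total` as the all-races group is inferred from `eftotlt` and `ctotalt`. The decoder is a hypothetical helper and deliberately rejects derived names such as `l_cnonwhite` or `l_efbkaat2`.

```python
SOURCES = {"ef": "fall enrollment", "c": "completions"}
SEX = {"t": "total", "m": "male", "w": "female"}
GROUPS = {
    "totl": "All races", "total": "All races",
    "bkaa": "Black or African American", "hisp": "Hispanic or Latino",
    "whit": "White", "asia": "Asian", "aian": "American Indian or Alaska Native",
    "2mor": "Two or more races", "unkn": "Race/ethnicity unknown",
}

def decode(name):
    """Split an IPEDS-style name such as 'efbkaat' or 'l_cwhitm' into its parts."""
    logged = name.startswith("l_")
    core = name[2:] if logged else name
    for prefix, source in SOURCES.items():
        if core.startswith(prefix):
            group, sex = core[len(prefix):-1], core[-1]
            if group in GROUPS and sex in SEX:
                return {"log": logged, "source": source,
                        "group": GROUPS[group], "sex": SEX[sex]}
    raise ValueError(f"not a raw IPEDS count name: {name!r}")

print(decode("efbkaat"))
print(decode("l_cwhitt"))
```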

---

## 7. Data Pipeline

```
                   RAW DATA
                      |
    ┌─────────────────┼─────────────────┐
    |                 |                 |
 ETS Data        IPEDS API        Controls/Policy
 (data/raw/ets/) (downloaded)     (data/raw/policy/)
    |                 |                 |
    v                 v                 |
 01_clean_ets     02_clean_ipeds       |
    |                 |                 |
    v                 v                 |
 ets_treatment    ipeds_data           |
 _data.xlsx       _cleaned.xlsx        |
    |                 |                 |
    └────────┬────────┘                 |
             |                          |
             v                          |
         04_merge_event_data  <─────────┘
             |
    ┌────────┼────────┐
    |                 |
    v                 v
 enrollment_      graduation_
 event_data       event_data
    |                 |
    ├─────────────────┤
    |        |        |
    v        v        v
   03      05/06      07
 Tables   Regressions Figures
  1-3      4-7,A1-A2  1-7,A2-A3
```

---

## 8. Sample Construction

### Main Enrollment Sample (Table 4)
- Start with all IPEDS institutions in 24 PPST/Core states
- Restrict to institutions with education enrollment > 0 in at least one biennial year
- Biennial years: 2008, 2010, 2012, 2014, 2016, 2018
- Result: ~566 institutions, ~3,201 observations (2,882-2,896 in regression samples depending on variable availability)

### Main Graduation Sample (Table 5)
- Start with all IPEDS institutions in 24 PPST/Core states
- Restrict to institutions with education completions > 0 in at least one year
- Annual years: 2009-2020
- Result: ~568 institutions, ~6,384 observations (5,736-5,748 in regression samples)

### Title II Sample (Table 7)
- All teacher preparation programs in 22 sample states reporting to Title II
- Annual years: 2011-2020
- ~891 programs and ~12,326 observations in the raw file; 5,554 in the regression sample (after the state restriction and data quality filters)

### Subsample Definitions

| Subsample | Definition | Used In |
|-----------|-----------|---------|
| All | Full regression sample | Tables 4-5, Cols. 1-2 |
| More Selective | `selective == 1` (25th-percentile SAT/ACT score above the 2010 median) | Tables 4-5, Col. 3 |
| Less Selective | `selective == 0` (25th-percentile SAT/ACT score at or below the 2010 median) | Tables 4-5, Col. 4 |
| White | Outcome = `l_efwhitt` or `l_cwhitt` | Tables 4-5, Col. 5 |
| Non-White | Outcome = `l_nonwhite` or `l_cnonwhite` | Tables 4-5, Col. 6 |
| Shrinking | `shrinking_state == 1` (declining school-age population) | Tables 4-5, Col. 7 |
| Growing | `shrinking_state == 0` (stable or growing school-age population) | Tables 4-5, Col. 8 |
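The subsamples come in two kinds. Some filter rows (selectivity, state population trend). Others keep every row and swap the outcome variable (White/Non-White). The sketch below makes that distinction explicit. The `Obs` record and its fields are hypothetical stand-ins for the analysis files. The pairing of `l_efwhitt`/`l_nonwhite` with Table 4 (enrollment) and `l_cwhitt`/`l_cnonwhite` with Table 5 (completions) is an assumption based on the IPEDS-style prefixes.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Obs:
    unitid: int                     # hypothetical institution identifier
    selective: Optional[int]        # 1 = more selective, 0 = less selective
    shrinking_state: Optional[int]  # 1 = declining school-age population

# Row-filter subsamples (Cols. 1-4 and 7-8). Missing flags match neither 1 nor 0.
ROW_FILTERS: dict[str, Callable[[Obs], bool]] = {
    "All": lambda o: True,
    "More Selective": lambda o: o.selective == 1,
    "Less Selective": lambda o: o.selective == 0,
    "Shrinking": lambda o: o.shrinking_state == 1,
    "Growing": lambda o: o.shrinking_state == 0,
}

# Cols. 5-6 keep the full sample and change the outcome.
# Assumed order: (Table 4 enrollment outcome, Table 5 graduation outcome).
OUTCOME_SWAPS = {
    "White": ("l_efwhitt", "l_cwhitt"),
    "Non-White": ("l_nonwhite", "l_cnonwhite"),
}

def split_subsamples(obs: list[Obs]) -> dict[str, list[Obs]]:
    """Return the observations retained by each row-filter subsample."""
    return {name: [o for o in obs if keep(o)] for name, keep in ROW_FILTERS.items()}

obs = [Obs(1, 1, 1), Obs(2, 0, 1), Obs(3, 1, 0), Obs(4, None, 0)]
for name, kept in split_subsamples(obs).items():
    print(name, [o.unitid for o in kept])
```

An institution with a missing `selective` flag (unit 4 above) remains in the full and population-trend samples but drops out of both selectivity columns.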

---

## 9. Missing Data and Special Values

### Missing Values

| Convention | Meaning | Where Used |
|-----------|---------|------------|
| NaN / blank | Standard missing (not applicable or not available) | All files |
| 0 (in lead/lag variables) | Not in this time period (structural zero, not missing) | Event study indicators in .dta files |
| -3 (Carnegie) | Not classified / not applicable | `ccbasic`, `carnegie` |
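These conventions pull in different directions. NaN and the Carnegie `-3` code should become missing, but a `0` in an event-study lead/lag indicator is a real value and must survive cleaning. The minimal sketch below assumes each observation is a plain dict. The indicator name `lag1` in the example is hypothetical.

```python
import math
from typing import Any, Optional

CARNEGIE_VARS = ("ccbasic", "carnegie")
NOT_CLASSIFIED = -3

def to_missing(value: Any) -> Optional[Any]:
    """Map None and float NaN to None; return every other value unchanged."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    return value

def clean_missing(row: dict[str, Any]) -> dict[str, Any]:
    """Copy a row, turning NaN into None and Carnegie -3 codes into None.

    Zeros are deliberately left alone: in lead/lag indicators they are
    structural zeros, not missing values.
    """
    out = {k: to_missing(v) for k, v in row.items()}
    for var in CARNEGIE_VARS:
        if out.get(var) == NOT_CLASSIFIED:
            out[var] = None
    return out

print(clean_missing({"ccbasic": -3, "carnegie": 15, "lag1": 0, "x": float("nan")}))
```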

### Special Coded Values

| Value | Meaning | Where Used |
|-------|---------|------------|
| -6 | Suppressed for privacy (count < 6) | Title II enrollment variables (`maleenrollment`, `femaleenrollment`, race enrollments) |
| "dropped" | State stopped requiring this test | ETS raw Math/Reading/Writing xlsx files |
| 0 in SAT/ACT baseline variables | Not reported / test-optional | `satv_2008`, `act_2008` etc. (0 means the institution did not report, not that scores are zero) |
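Each special code requires a different recode. A minimal sketch of all three follows. Returning a `(score, dropped)` pair for the ETS cells is an illustrative choice, not necessarily how the package stores it, and the function names are hypothetical.

```python
from typing import Any, Optional

SUPPRESSED = -6  # Title II privacy suppression code

def clean_title2_count(value: Optional[float]) -> Optional[float]:
    """-6 marks a suppressed Title II count (< 6): treat as missing, never as negative."""
    return None if value == SUPPRESSED else value

def clean_baseline_score(value: Optional[float]) -> Optional[float]:
    """A 0 in satv_2008, act_2008, etc. means 'not reported', not a score of zero."""
    return None if value == 0 else value

def clean_ets_cell(value: Any) -> tuple[Any, bool]:
    """Split an ETS raw cell into (passing score, dropped flag).

    'dropped' means the state stopped requiring the test, so there is no score.
    """
    if isinstance(value, str) and value.strip().lower() == "dropped":
        return None, True
    return value, False

print(clean_title2_count(-6), clean_baseline_score(0), clean_ets_cell("dropped"))
```

Recoding these values before taking logs or averages matters because `-6` and `0` are valid numbers that would otherwise enter the estimates silently.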

### Key Data Quality Notes

1. **PA Math Cutoff Change:** Pennsylvania lowered its Praxis Math passing score from 150 to 142 in 2017. The cleaning script (`01_clean_ets_data.R`) and the `pa2018` indicator account for this change.

2. **SC Praxis Core Dropped:** South Carolina dropped the Praxis Core requirement in 2019. The `sc2018` indicator controls for this.

3. **AR, CT, DE Fill-Forward:** Arkansas, Connecticut, and Delaware have missing passing scores for some post-2016 years. These are filled forward from the last observed value.

4. **ND, TN Exclusion:** North Dakota and Tennessee are in the ETS data (24 states) but excluded from the regression sample (22 states) because they did not fully transition to Praxis Core.

5. **Composite Scoring States:** Some states (e.g., CT, NJ, PA) allow candidates to pass with a combined score across subjects instead of a minimum score on each subject. The composite TDI adjusts for this by treating the effective per-subject cutoff as approximately 3 points lower.

6. **Biennial vs Annual:** Enrollment data is biennial (even years 2008-2018) because IPEDS fall enrollment by CIP code is only available biennially. Graduation data is annual (2009-2020).

7. **Bootstrap Standard Errors:** All regression standard errors come from a pairs cluster bootstrap that resamples states (B = 1,000 replications, seed = 12345). Conventional cluster-robust standard errors, computed without the bootstrap, will be systematically smaller.
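A pairs cluster bootstrap draws whole states with replacement, keeping every observation of each drawn state together, and re-estimates the model on each resample. The standard error is the standard deviation of those replicate estimates. The sketch below uses a bivariate OLS slope as a toy stand-in for the paper's regressions, with a hypothetical `(cluster_id, x, y)` row layout. Python's `random` generator will not reproduce the package's draws, even with seed 12345.

```python
import random
import statistics
from collections import defaultdict

def ols_slope(pairs):
    """Slope from a bivariate OLS of y on x (toy stand-in estimator)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sxx  # raises ZeroDivisionError if x has no variation

def pairs_cluster_bootstrap_se(rows, estimator, B=1000, seed=12345):
    """rows: (cluster_id, x, y) tuples. Resample whole clusters with replacement."""
    by_cluster = defaultdict(list)
    for cluster, x, y in rows:
        by_cluster[cluster].append((x, y))
    clusters = list(by_cluster)
    rng = random.Random(seed)  # fixed seed -> reproducible draws within Python
    estimates = []
    for _ in range(B):
        draw = rng.choices(clusters, k=len(clusters))  # clusters, not observations
        sample = [pair for c in draw for pair in by_cluster[c]]
        try:
            estimates.append(estimator(sample))
        except ZeroDivisionError:
            continue  # skip degenerate resamples with no variation in x
    return statistics.stdev(estimates)

rows = [(c, float(x), 2.0 * x + ((7 * c + 3 * x) % 5) * 0.1)
        for c in range(10) for x in range(5)]
print(pairs_cluster_bootstrap_se(rows, ols_slope, B=200))
```

Resampling clusters instead of individual rows preserves within-state correlation. With only about 22 states, this is why the bootstrap standard errors differ from the analytic cluster-robust ones.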
